# OCR enhancement

Models in this category focus on converting image content into text.

Aya Vision 8B is an open-weight 8-billion-parameter multilingual vision-language model supporting visual and language tasks in 23 languages.

Transformers, Multilingual

Internvit 300M 448px

InternViT-300M-448px is an efficient vision foundation model distilled from InternViT-6B-448px-V1-5. It supports dynamic input resolution built from 448×448 tiles and can process 1 to 40 tiles per image.

Idefics2 8b Chatty

Idefics2 is an open multimodal model capable of accepting arbitrary sequences of images and text as input and generating text output. The model can answer questions about images, describe visual content, create stories based on multiple images, or function purely as a language model.

Transformers, English

Internvit 6B 448px V1 5

InternViT-6B-448px-V1-5 is a vision foundation model fine-tuned from InternViT-6B-448px-V1-2, with improved robustness, OCR capability, and high-resolution processing.

Internvit 6B 448px V1 2

InternViT-6B-448px-V1-2 is a vision foundation model intended as a feature backbone, with about 5.54 billion parameters and support for 448×448-pixel image input.

Donut Base Payslips

A document understanding model based on the Donut architecture, fine-tuned specifically for processing payslip images.

Text Recognition

An open-source model released under the MIT license, with a reported character error rate (CER) of 0.0019 on its target task.
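
For context on the CER figure, here is a minimal sketch of how character error rate is commonly computed: the character-level edit distance between the reference text and the model output, divided by the reference length. The function names `edit_distance` and `cer` are illustrative, and the listing does not say which evaluation script or dataset produced the 0.0019 value.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical OCR output with one wrong character out of 16.
    print(cer("Net pay: 1250.00", "Net pay: 1250.08"))
```

Under this definition, a CER of 0.0019 corresponds to roughly 19 character errors per 10,000 reference characters.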

Large Language Model
